ABSTRACT

Classifying large quantities of multidimensional data (e.g.,
remotely sensed agricultural data) (Remote, 1968) requires
efficient and effective classification techniques, together with
dimension-reducing transformations that preserve information.
This paper deals with the construction of transformations that
minimally degrade information (i.e., class separability). We
consider only linear dimension-reducing transformations for
multivariate normal populations, with information content
measured by divergence (Kullback, 1968).

1. INTRODUCTION 


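For n-dimensional normal classes $N(m_i, V_i)$, $i = 1,\ldots,m$, the
divergence between classes i and j (Kullback, 1968) is given by

$$\begin{aligned}
D(i,j) &= \tfrac{1}{2}\,\mathrm{tr}\big[(V_i - V_j)(V_j^{-1} - V_i^{-1})\big]
        + \tfrac{1}{2}\,\mathrm{tr}\big[(V_i^{-1} + V_j^{-1})\,\delta_{ij}\delta_{ij}^T\big] \\
       &= \tfrac{1}{2}\,\mathrm{tr}\big[V_i^{-1}(V_j + \delta_{ij}\delta_{ij}^T)\big]
        + \tfrac{1}{2}\,\mathrm{tr}\big[V_j^{-1}(V_i + \delta_{ij}\delta_{ij}^T)\big] - n,
\end{aligned}$$

where $\delta_{ij} = m_i - m_j$.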
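The pairwise divergence can be evaluated directly from the class means and covariances. The following is a minimal NumPy sketch, assuming the standard closed form for two normal classes, $\tfrac{1}{2}\mathrm{tr}[(V_i - V_j)(V_j^{-1} - V_i^{-1})] + \tfrac{1}{2}\mathrm{tr}[(V_i^{-1} + V_j^{-1})\delta\delta^T]$ with $\delta = m_i - m_j$. The function name `divergence` and the toy inputs are illustrative assumptions, not part of the original work.

```python
import numpy as np

def divergence(m_i, V_i, m_j, V_j):
    """Divergence between the normal classes N(m_i, V_i) and N(m_j, V_j)."""
    Vi_inv = np.linalg.inv(V_i)
    Vj_inv = np.linalg.inv(V_j)
    delta = m_i - m_j
    # Term that depends only on the difference between the covariances.
    cov_term = 0.5 * np.trace((V_i - V_j) @ (Vj_inv - Vi_inv))
    # Term that depends on the separation of the means.
    mean_term = 0.5 * delta @ (Vi_inv + Vj_inv) @ delta
    return cov_term + mean_term

# Toy example (assumed data): equal identity covariances, means one unit apart.
m1, V1 = np.array([0.0, 0.0]), np.eye(2)
m2, V2 = np.array([1.0, 0.0]), np.eye(2)
d12 = divergence(m1, V1, m2, V2)
print(d12)  # 1.0: with equal covariances only the mean term remains
```

With equal covariances the first trace vanishes and the divergence reduces to the squared Mahalanobis distance between the means, which is why the toy value is exactly 1.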

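The interclass divergence (Decell and Quirein, Oct. 1973) for m
populations is given by

$$D = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} D(i,j),$$

and it follows that

$$D = \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} V_i^{-1} S_i\Big] - n,$$

where

$$S_i = \frac{2}{m(m-1)} \sum_{j \ne i} \big[V_j + \delta_{ij}\delta_{ij}^T\big],
\qquad \delta_{ij} = m_i - m_j.$$
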
If B is a k x n rank k matrix, the B-interclass divergence
(Decell and Quirein, Oct. 1973) is given by

$$D_B = \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (BV_iB^T)^{-1}(BS_iB^T)\Big] - k.$$

As in the case of average interclass divergence, the B-interclass
divergence is a measure of the "separation" in the classes
$N(Bm_i, BV_iB^T)$, $i = 1,\ldots,m$, and is a useful tool for
constructing rank k linear transformations that preserve "class
separability". It has been shown (Decell and Quirein, Oct. 1973)
that whenever $D_B = D$, the probability of misclassification
(Anderson, 1958) for the classes $N(Bm_i, BV_iB^T)$, $i = 1,\ldots,m$,
is the same as the probability of misclassification for the
classes $N(m_i, V_i)$, $i = 1,\ldots,m$.
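To make the role of B concrete, the sketch below computes an interclass divergence before and after a linear map by transforming each class to $N(Bm_i, BV_iB^T)$. It assumes the interclass divergence is the plain average of the pairwise divergences over all class pairs; that normalization, the function names, and the synthetic class statistics are assumptions made for illustration only.

```python
import numpy as np
from itertools import combinations

def divergence(m_i, V_i, m_j, V_j):
    """Pairwise divergence between N(m_i, V_i) and N(m_j, V_j)."""
    Vi_inv, Vj_inv = np.linalg.inv(V_i), np.linalg.inv(V_j)
    delta = m_i - m_j
    cov_term = 0.5 * np.trace((V_i - V_j) @ (Vj_inv - Vi_inv))
    mean_term = 0.5 * delta @ (Vi_inv + Vj_inv) @ delta
    return cov_term + mean_term

def interclass_divergence(means, covs):
    """Average of the pairwise divergences (normalization is an assumption)."""
    pairs = list(combinations(range(len(means)), 2))
    total = sum(divergence(means[i], covs[i], means[j], covs[j])
                for i, j in pairs)
    return total / len(pairs)

def b_interclass_divergence(B, means, covs):
    """Interclass divergence of the transformed classes N(B m_i, B V_i B^T)."""
    return interclass_divergence([B @ mu for mu in means],
                                 [B @ V @ B.T for V in covs])

# Synthetic class statistics (assumed data, not the paper's).
rng = np.random.default_rng(0)
n, k, m = 4, 2, 3
means = [rng.normal(size=n) for _ in range(m)]
covs = []
for _ in range(m):
    A = rng.normal(size=(n, n))
    covs.append(A @ A.T + n * np.eye(n))  # symmetric positive definite

B = np.hstack([np.eye(k), np.zeros((k, n - k))])   # keeps the first k coordinates
T, _ = np.linalg.qr(rng.normal(size=(n, n)))       # an orthogonal n x n matrix

D = interclass_divergence(means, covs)
D_B = b_interclass_divergence(B, means, covs)      # rank-k reduction
D_T = b_interclass_divergence(T, means, covs)      # invertible change of basis
print(D, D_B, D_T)
```

In this sketch an invertible transformation leaves the divergence unchanged, while a rank k reduction can only lower it or leave it equal, which is the sense in which $D_B$ measures how much separability a given B preserves.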

2. THEORETICAL PRELIMINARIES

We will assume that k is an integer (k < n) and develop a
procedure for selecting a k x n rank k matrix B such that $D_B$ is
maximum. The procedure will be based upon the following theorem
(Decell and Smiley, to appear). We will let
$C = \{u \in R^n : \|u\| = 1\}$ and $T(H) = \{H = I - 2uu^T : u \in C\}$
denote the set of Householder transformations defined on $R^n$
(Householder, 1968). Throughout, $(I_k|Z)$ denotes the k x n matrix
whose first k columns form the identity $I_k$ and whose remaining
columns are zero.
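A short numerical sketch may help fix the objects just defined. It builds $H = I - 2uu^T/(u^Tu)$, which equals $I - 2uu^T$ when u is a unit vector, and the k x n matrix $(I_k|Z)H$, which is simply the first k rows of H. The helper names `householder` and `reduced` and the sample vector are assumptions for illustration.

```python
import numpy as np

def householder(u):
    """Return H = I - 2 u u^T / (u^T u); dividing by u^T u allows non-unit u."""
    u = np.asarray(u, dtype=float)
    return np.eye(u.size) - 2.0 * np.outer(u, u) / (u @ u)

def reduced(u, k):
    """Return the k x n matrix (I_k | Z) H, i.e. the first k rows of H."""
    return householder(u)[:k, :]

u = np.array([1.0, 2.0, 2.0, 0.0])   # arbitrary nonzero vector (assumed)
H = householder(u)
B = reduced(u, 2)

print(np.allclose(H, H.T))            # H is symmetric
print(np.allclose(H @ H, np.eye(4)))  # H is its own inverse, hence orthogonal
print(np.linalg.matrix_rank(B))       # the reduced matrix has rank k
```

Because H is orthogonal, any k of its rows are linearly independent, so every matrix of the form $(I_k|Z)H$ is automatically a k x n matrix of rank k.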

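Theorem. For each positive integer i let $H_i \in T(H)$ be
inductively chosen such that

$$D_{(I_k|Z)H} \le D_{(I_k|Z)H_1} \qquad (3)$$

for every $H \in T(H)$, and

$$D_{(I_k|Z)H_iH_{i-1}\cdots H_{i-(p-1)}\,H\,H_{i-(p+1)}\cdots H_1}
  \le D_{(I_k|Z)H_i\cdots H_1} \qquad (4)$$

for every $H \in T(H)$, $p = 0,1,\ldots,i-2$. Then the monotone
sequence $\{D_{(I_k|Z)H_i\cdots H_1}\}_{i=1}^{\infty}$ is bounded above,
and hence

$$\lim_{i\to\infty} D_{(I_k|Z)H_i\cdots H_1}
  = \mathrm{l.u.b.}\,\{D_{(I_k|Z)H_i\cdots H_1}\}. \qquad (5)$$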


We would, of course, be pleased if it were the case that
$\mathrm{l.u.b.}\,\{D_{(I_k|Z)H_i\cdots H_1}\} = D$. This, unfortunately,
is not always the case for some choice of k < n and is not
possible, in general, for any k < n. We do know that there is
some k x n rank k matrix B for which $D_B$ is maximum and, in
general, that $D_B \le D$ (Decell and Quirein, Oct. 1973). It
follows, moreover, since the matrices of the form
$(I_k|Z)H_i\cdots H_1$ have rank k, that
$\mathrm{l.u.b.}\,\{D_{(I_k|Z)H_i\cdots H_1}\} \le D_B$ for this
maximizing B.

We will call the sequence $\{D_{(I_k|Z)H_i\cdots H_1}\}$ suboptimal
whenever

$$\mathrm{l.u.b.}\,\{D_{(I_k|Z)H_i\cdots H_1}\} < D_B$$

(and optimal in the case of equality).

There are several open theoretical questions that deal with
the conjecture that the sequence is, in general, optimal and
cofinally constant beyond the index i = min{k, n-k} (Decell and
Smiley, to appear). In what follows we will develop a procedure
for constructing the subject sequence and demonstrate its
application to agricultural data.
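3. THE GRADIENT OF $D_B$

It has been shown (Quirein, Nov. 1972) that the differential
$dD_B$ of $D_B$ (regarded as a function of the k x n matrix B) can
be expressed in the form $dD_B = F + G$, where, when the indicated
inverses exist,

$$\begin{aligned}
F &= \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (BV_iB^T)^{-1}(dB\,S_iB^T + BS_i\,dB^T)\Big] \\
  &= \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (dB\,S_iB^T)(BV_iB^T)^{-1}\Big]
   + \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (BS_i\,dB^T)(BV_iB^T)^{-1}\Big] \\
  &= \mathrm{tr}\Big[\sum_{i=1}^{m} (dB\,S_iB^T)(BV_iB^T)^{-1}\Big]
\end{aligned}$$

and

$$\begin{aligned}
G &= -\tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (BV_iB^T)^{-1}(dB\,V_iB^T + BV_i\,dB^T)(BV_iB^T)^{-1}(BS_iB^T)\Big] \\
  &= -\tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (dB\,V_iB^T)(BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1}\Big] \\
  &\quad - \tfrac{1}{2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} (BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1}(BV_i\,dB^T)\Big] \\
  &= -\,\mathrm{tr}\Big[\sum_{i=1}^{m} (dB\,V_iB^T)(BV_iB^T)^{-1}(BS_iB^T)(BV_iB^T)^{-1}\Big].
\end{aligned}$$
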
Thus,

$$\begin{aligned}
dD_B &= \mathrm{tr}\Big[\sum_{i=1}^{m} dB\,\{S_iB^T - V_iB^T(BV_iB^T)^{-1}(BS_iB^T)\}(BV_iB^T)^{-1}\Big] \\
     &= \mathrm{tr}\Big[\sum_{i=1}^{m} dB\,Q_i\Big],
\end{aligned}$$

where

$$Q_i = \{S_iB^T - V_iB^T(BV_iB^T)^{-1}(BS_iB^T)\}(BV_iB^T)^{-1}.$$
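The expression $dD_B = \mathrm{tr}[\sum_i dB\,Q_i]$ says that the matrix of partial derivatives of $D_B$ with respect to the entries of B is $(\sum_i Q_i)^T$. The sketch below checks this numerically against central finite differences, taking $D_B = \tfrac{1}{2}\mathrm{tr}[\sum_i (BV_iB^T)^{-1}(BS_iB^T)] - k$. The function names are hypothetical, and random symmetric positive definite matrices stand in for the $V_i$ and $S_i$; both are assumptions made only to exercise the formula.

```python
import numpy as np

def d_b(B, V_list, S_list):
    """D_B = (1/2) tr[sum_i (B V_i B^T)^{-1} (B S_i B^T)] - k."""
    total = sum(np.trace(np.linalg.solve(B @ V @ B.T, B @ S @ B.T))
                for V, S in zip(V_list, S_list))
    return 0.5 * total - B.shape[0]

def q_matrices(B, V_list, S_list):
    """Q_i = {S_i B^T - V_i B^T (B V_i B^T)^{-1} (B S_i B^T)} (B V_i B^T)^{-1}."""
    Q = []
    for V, S in zip(V_list, S_list):
        A_inv = np.linalg.inv(B @ V @ B.T)
        Q.append((S @ B.T - V @ B.T @ A_inv @ (B @ S @ B.T)) @ A_inv)
    return Q

def grad_b(B, V_list, S_list):
    """k x n matrix of partial derivatives of D_B, equal to (sum_i Q_i)^T."""
    return sum(q_matrices(B, V_list, S_list)).T

def numeric_grad_b(B, V_list, S_list, eps=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    G = np.zeros_like(B)
    for a in range(B.shape[0]):
        for b in range(B.shape[1]):
            step = np.zeros_like(B)
            step[a, b] = eps
            G[a, b] = (d_b(B + step, V_list, S_list)
                       - d_b(B - step, V_list, S_list)) / (2.0 * eps)
    return G

# Synthetic stand-ins for V_i and S_i (assumed data).
rng = np.random.default_rng(1)
n, k, m = 5, 2, 3

def random_spd():
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

V_list = [random_spd() for _ in range(m)]
S_list = [random_spd() for _ in range(m)]
B = rng.normal(size=(k, n))

max_error = np.max(np.abs(grad_b(B, V_list, S_list)
                          - numeric_grad_b(B, V_list, S_list)))
print(max_error)
```

A useful side check is that $BQ_i = 0$ for every i, which reflects the fact that $D_B$ does not change when B is multiplied on the left by an invertible k x k matrix.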

We are, of course, interested in extremizing $D_B$ over the
particular subclass of k x n rank k matrices of the form
$(I_k|Z)H$ where $H \in T(H)$ (e.g., for i = 1 we find $H_1$ that
maximizes $D_{(I_k|Z)H}$). Actually, one need only consider what is
required to compute $H_1$. The computation of $H_2$ is accomplished
by the same procedure as that for $H_1$. It is simply a matter of,
after selecting $H_1$, redefining the m classes to be
$N(H_1m_i, H_1V_iH_1^T)$, $i = 1,\ldots,m$, and proceeding as in the
selection of $H_1$.
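The redefinition step works because applying $(I_k|Z)H$ to classes already transformed by $H_1$ is the same as applying $(I_k|Z)HH_1$ to the original classes. The sketch below checks that identity numerically. It assumes the interclass divergence is the average of the pairwise divergences, and it uses two randomly generated Householder matrices rather than optimized ones; the names and data are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def divergence(m_i, V_i, m_j, V_j):
    """Pairwise divergence between N(m_i, V_i) and N(m_j, V_j)."""
    Vi_inv, Vj_inv = np.linalg.inv(V_i), np.linalg.inv(V_j)
    delta = m_i - m_j
    return (0.5 * np.trace((V_i - V_j) @ (Vj_inv - Vi_inv))
            + 0.5 * delta @ (Vi_inv + Vj_inv) @ delta)

def b_divergence(B, means, covs):
    """Average pairwise divergence of the classes N(B m_i, B V_i B^T)."""
    tm = [B @ mu for mu in means]
    tc = [B @ V @ B.T for V in covs]
    pairs = list(combinations(range(len(tm)), 2))
    return sum(divergence(tm[i], tc[i], tm[j], tc[j]) for i, j in pairs) / len(pairs)

def householder(w):
    """H = I - 2 w w^T / (w^T w) for a nonzero vector w."""
    return np.eye(w.size) - 2.0 * np.outer(w, w) / (w @ w)

# Synthetic class statistics (assumed data).
rng = np.random.default_rng(2)
n, k, m = 5, 2, 3
means = [rng.normal(size=n) for _ in range(m)]
covs = []
for _ in range(m):
    A = rng.normal(size=(n, n))
    covs.append(A @ A.T + n * np.eye(n))

E = np.hstack([np.eye(k), np.zeros((k, n - k))])  # the matrix (I_k | Z)
H1 = householder(rng.normal(size=n))              # stands in for a selected H_1
H2 = householder(rng.normal(size=n))              # candidate for H_2

# Divergence of (I_k | Z) H2 H1 on the original classes ...
direct = b_divergence(E @ H2 @ H1, means, covs)

# ... equals that of (I_k | Z) H2 on the classes redefined through H1.
means1 = [H1 @ mu for mu in means]
covs1 = [H1 @ V @ H1.T for V in covs]
redefined = b_divergence(E @ H2, means1, covs1)
print(direct, redefined)
```

This is why each stage of the procedure can reuse the single-transformation search unchanged: only the class statistics passed to it differ from one stage to the next.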

With these facts in mind we will simply calculate the gradient
of $D_B$ where B is restricted to having the form
$B = (I_k|Z)H$, $H \in T(H)$. The restriction $H \in T(H)$ can be
accomplished by considering those k x n rank k matrices of the
form

$$B = (I_k|Z)\Big(I - 2\,\frac{ww^T}{w^Tw}\Big), \qquad w \in R^n \ (w \ne 0).$$

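It follows that

$$\begin{aligned}
dB &= d\Big[(I_k|Z)\Big(I - 2\,\frac{ww^T}{w^Tw}\Big)\Big]
    = -2(I_k|Z)\,d\Big(\frac{ww^T}{w^Tw}\Big) \\
   &= -2(I_k|Z)\Big[\frac{w^Tw\,d(ww^T) - ww^T\,d(w^Tw)}{(w^Tw)^2}\Big] \\
   &= \frac{-2(I_k|Z)}{(w^Tw)^2}\big[w^Tw(dw\,w^T + w\,dw^T) - ww^T(w^Tdw + dw^Tw)\big] \\
   &= \frac{-2(I_k|Z)}{(w^Tw)^2}\big[dw\,w^Tww^T + ww^Tw\,dw^T - ww^Tdw\,w^T - w\,dw^Tww^T\big] \\
   &= \frac{-2(I_k|Z)}{(w^Tw)^2}\big[(dw\,w^T - w\,dw^T)ww^T - ww^T(dw\,w^T - w\,dw^T)\big].
\end{aligned}$$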

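Substituting the latter in the expression for $dD_B$,

$$\begin{aligned}
dD_B &= \mathrm{tr}\Big[\sum_{i=1}^{m} dB\,Q_i\Big] \\
     &= \mathrm{tr}\Big[\sum_{i=1}^{m} \frac{-2(I_k|Z)}{(w^Tw)^2}\{(dw\,w^T - w\,dw^T)ww^T - ww^T(dw\,w^T - w\,dw^T)\}Q_i\Big] \\
     &= \mathrm{tr}\Big[\sum_{i=1}^{m} \frac{-2Q_i(I_k|Z)}{(w^Tw)^2}\{(dw\,w^T - w\,dw^T)ww^T - ww^T(dw\,w^T - w\,dw^T)\}\Big] \\
     &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \{ww^TQ_i(I_k|Z)(dw\,w^T - w\,dw^T) - Q_i(I_k|Z)ww^T(dw\,w^T - w\,dw^T)\}\Big] \\
     &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \{M_i\,dw\,w^T - M_iw\,dw^T - N_i\,dw\,w^T + N_iw\,dw^T\}\Big],
\end{aligned}$$

where $M_i = ww^TQ_i(I_k|Z)$ and $N_i = Q_i(I_k|Z)ww^T$. Hence

$$\begin{aligned}
dD_B &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \{w^TM_i\,dw - w^TN_i\,dw + N_iw\,dw^T - M_iw\,dw^T\}\Big] \\
     &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \{dw^TM_i^Tw - dw^TN_i^Tw + N_iw\,dw^T - M_iw\,dw^T\}\Big] \\
     &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \{M_i^Tw\,dw^T - N_i^Tw\,dw^T + N_iw\,dw^T - M_iw\,dw^T\}\Big] \\
     &= \frac{-2}{(w^Tw)^2}\,\mathrm{tr}\Big[\sum_{i=1}^{m} \big((M_i^T - N_i^T) - (M_i - N_i)\big)w\,dw^T\Big].
\end{aligned}$$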


The necessary condition that w be extremal is then

$$G(w) = \frac{-2}{(w^Tw)^2}\sum_{i=1}^{m}\big((M_i^T - N_i^T) - (M_i - N_i)\big)w = \theta$$

(the zero vector).


We note that G(w) is the gradient of
$D_{(I_k|Z)(I - 2ww^T/w^Tw)}$ with respect to w, and we use a
steepest descent procedure for finding the extremal w. The
process is repeated for each sequential index until corresponding
values of divergence "stabilize." Test results are presented in
the following tables. The C-1 flight line data is twelve-channel
data for nine agricultural classes: soybeans, corn, oats, red
clover, alfalfa, rye, bare soil, and two types of wheat. The Hill
County data is sixteen-channel data for five agricultural classes:
winter wheat, fallow crop, barley, grass, and stubble.
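The search for an extremal w can be sketched as follows. The code evaluates $G(w) = \frac{-2}{(w^Tw)^2}\sum_i[(M_i - N_i)^T - (M_i - N_i)]w$ with $M_i = ww^TQ_i(I_k|Z)$ and $N_i = Q_i(I_k|Z)ww^T$, and then climbs along it. The step rule (halving the step until the divergence improves, and renormalizing w) is an assumption, since the text does not state one, and it is written as gradient ascent because the goal is to maximize the divergence. The function names and the random symmetric positive definite stand-ins for $V_i$ and $S_i$ are likewise assumptions.

```python
import numpy as np

def selector(k, n):
    """The k x n matrix (I_k | Z), with Z a zero block."""
    return np.hstack([np.eye(k), np.zeros((k, n - k))])

def b_of_w(w, k):
    """B = (I_k | Z)(I - 2 w w^T / w^T w) for a nonzero vector w."""
    H = np.eye(w.size) - 2.0 * np.outer(w, w) / (w @ w)
    return selector(k, w.size) @ H

def d_b(B, V_list, S_list):
    """D_B = (1/2) tr[sum_i (B V_i B^T)^{-1} (B S_i B^T)] - k."""
    total = sum(np.trace(np.linalg.solve(B @ V @ B.T, B @ S @ B.T))
                for V, S in zip(V_list, S_list))
    return 0.5 * total - B.shape[0]

def grad_w(w, k, V_list, S_list):
    """G(w) = -2/(w^T w)^2 * sum_i [(M_i - N_i)^T - (M_i - N_i)] w."""
    B, E, wwT = b_of_w(w, k), selector(k, w.size), np.outer(w, w)
    acc = np.zeros((w.size, w.size))
    for V, S in zip(V_list, S_list):
        A_inv = np.linalg.inv(B @ V @ B.T)
        Q = (S @ B.T - V @ B.T @ A_inv @ (B @ S @ B.T)) @ A_inv
        M, N = wwT @ Q @ E, Q @ E @ wwT
        acc += (M - N).T - (M - N)
    return (-2.0 / (w @ w) ** 2) * (acc @ w)

def ascend(w0, k, V_list, S_list, step=1.0, iters=100):
    """Gradient ascent with step halving; only improving steps are accepted."""
    w = w0 / np.linalg.norm(w0)
    value = d_b(b_of_w(w, k), V_list, S_list)
    for _ in range(iters):
        g, t, improved = grad_w(w, k, V_list, S_list), step, False
        while t > 1e-10 and not improved:
            cand = w + t * g
            cand = cand / np.linalg.norm(cand)  # D_B depends only on direction
            cand_value = d_b(b_of_w(cand, k), V_list, S_list)
            if cand_value > value:
                w, value, improved = cand, cand_value, True
            t *= 0.5
        if not improved:
            break
    return w, value

# Synthetic stand-ins for V_i and S_i (assumed data).
rng = np.random.default_rng(0)
n, k, m = 5, 2, 3

def random_spd():
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

V_list = [random_spd() for _ in range(m)]
S_list = [random_spd() for _ in range(m)]
w0 = np.ones(n)                                   # arbitrary starting vector
start_value = d_b(b_of_w(w0, k), V_list, S_list)
w_best, best_value = ascend(w0, k, V_list, S_list)
print(start_value, best_value)
```

Because the Householder matrix depends only on the direction of w, G(w) is orthogonal to w, and renormalizing after each step does not change the divergence. Repeating this search after redefining the classes through the selected transformation yields the next term of the sequence.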

The starting value $w_0$ for the steepest descent procedure
for selecting each successive Householder transformation $H_i$
was arbitrarily chosen to be the fixed vector
$w_0 = (\tfrac{1}{\cdot}, \tfrac{1}{\cdot}, \ldots, \tfrac{1}{\cdot})^T$.
Choosing starting values in this arbitrary fashion is certainly
not the most clever thing to do in the presence of the monotone
behavior of the sequence $\{D_{(I_k|Z)H_i\cdots H_1}\}$. One would
expect, for example, that the starting values for the selection
of $H_{i+1}$ should depend upon the unit vectors previously selected
as generators of $H_1, \ldots, H_i$ in such a way as to guarantee
that the starting value $w_0$ for the descent procedure for
selecting $H_{i+1}$ satisfies

$$D_{(I_k|Z)(I - 2w_0w_0^T/w_0^Tw_0)H_i\cdots H_1} \ge D_{(I_k|Z)H_i\cdots H_1}.$$


This rather arbitrary selection of the starting vector does, as
the examples demonstrate, violate the latter inequality. The
question of how to choose starting vectors satisfying the
latter inequality is still an open one, and its answer would
certainly decrease computation time.


C-1 Flight Line Data
n=12, k=6, m=9, D=10,660

    Iteration No.*     Divergence $D_B$
          1                 1982
          2                 3536
          3                 4533
          4                 5781
          5                 6910
          6                 7522
          7                 7710
          8                 7790
          9                 7838
         10                 7865
         11                 7881
         12                 7892


Hill County Data
n=16, k=8, m=5, D=636

    Iteration No.*     Divergence $D_B$
          1                114.58
          2                136.66
          3                152.27
          4                179.69
          5                223.81
          6                247.42
          7                252.78
          8                257.12
          9                260.74
         10                263.95

*Iteration counter i